Project Description¶
Welcome to New York City, one of the most visited cities in the world. Because many travelers come here, there are many Airbnb listings available for short or long stays. In this project, we will explore the New York Airbnb market by working with data stored in different file formats such as .csv
, .tsv
, and .xlsx
.
CSV, TSV, and Excel files are common ways to store data. We have three files containing information about Airbnb listings from 2019:
airbnb_price.csv
This CSV file contains details about listing prices and locations.listing_id
: a unique identifier for each listingprice
: the nightly price of the listing in US dollarsnbhood_full
: the name of the borough and neighborhood where the listing is located
airbnb_room_type.xlsx
This Excel file includes information about the listing descriptions and room types.listing_id
: a unique identifier for each listingdescription
: a text description of the listingroom_type
: Airbnb offers three room types — shared rooms, private rooms, and entire homes or apartments
airbnb_last_review.tsv
This TSV file holds data about hosts and the dates of their last reviews.listing_id
: a unique identifier for each listinghost_name
: the name of the host for the listinglast_review
: the date when the listing was last reviewed
Our goal is to practice importing, cleaning, and manipulating data from these different formats. We will then analyze and report insights that could help a real estate start-up better understand the Airbnb arket in New York City.
# Import necessary packages
import pandas as pd
import numpy as np
# Import CSV for prices
airbnb_price = pd.read_csv('airbnb_price.csv')
# Import Excel file for room types
airbnb_room_type = pd.read_excel('airbnb_room_type.xlsx')
# Import TSV for review dates
airbnb_last_review = pd.read_csv('airbnb_last_review.tsv', sep='\t')
# Join the three data frames together into one
listings = pd.merge(airbnb_price, airbnb_room_type, on='listing_id')
listings = pd.merge(listings, airbnb_last_review, on='listing_id')
listings.head()
listing_id | price | nbhood_full | description | room_type | host_name | last_review | |
---|---|---|---|---|---|---|---|
0 | 2595 | 225 dollars | Manhattan, Midtown | Skylit Midtown Castle | Entire home/apt | Jennifer | May 21 2019 |
1 | 3831 | 89 dollars | Brooklyn, Clinton Hill | Cozy Entire Floor of Brownstone | Entire home/apt | LisaRoxanne | July 05 2019 |
2 | 5099 | 200 dollars | Manhattan, Murray Hill | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt | Chris | June 22 2019 |
3 | 5178 | 79 dollars | Manhattan, Hell's Kitchen | Large Furnished Room Near B'way | private room | Shunichi | June 24 2019 |
4 | 5238 | 150 dollars | Manhattan, Chinatown | Cute & Cozy Lower East Side 1 bdrm | Entire home/apt | Ben | June 09 2019 |
What are the dates of the earliest and most recent reviews?¶
# convert "last_review" to datetime type
listings['last_review_date'] = pd.to_datetime(listings['last_review'], format='%B %d %Y')
first_reviewed = listings['last_review_date'].min()
last_reviewed = listings['last_review_date'].max()
print(first_reviewed)
print(last_reviewed)
2019-01-01 00:00:00 2019-07-09 00:00:00
How many of the listings are private rooms?¶
# Since there are differences in capitalization, making capitalization consistent
listings['room_type'] = listings['room_type'].str.lower()
private_room_count = listings[listings['room_type'] == 'private room'].shape[0]
private_room_count
11356
What is the average listing price?¶
# To convert price to numeric, removing " dollars" from each value
listings['price_clean'] = listings['price'].str.replace(' dollars', '').astype(float)
avg_price = listings['price_clean'].mean()
avg_price
141.7779364512674
Combine the new variables into one DataFrame called review_dates
with four columns in the following order: first_reviewed
, last_reviewed
, nb_private_rooms
, and avg_price
. The DataFrame should only contain one row of values.¶
review_dates = pd.DataFrame({
'first_reviewed': [first_reviewed],
'last_reviewed': [last_reviewed],
'nb_private_rooms': [private_room_count],
'avg_price': [round(avg_price, 2)]
})
review_dates
first_reviewed | last_reviewed | nb_private_rooms | avg_price | |
---|---|---|---|---|
0 | 2019-01-01 | 2019-07-09 | 11356 | 141.78 |